The Significance of Temporal-Difference Learning in Self-Play Training: TD-Rummy versus EVO-rummy
Authors
Abstract
Reinforcement learning has been used to train game-playing agents. The value function for a complex game must be approximated with a continuous function, because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield good results. This paper reports a direct comparison between an agent trained to play gin rummy with temporal-difference learning and the same agent trained with coevolution. Coevolution produced superior results.
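The self-play training loop described in the abstract can be illustrated with a minimal sketch: a single value function, represented as a linear combination of state features, is updated by TD(0) from transitions it generates against itself. The toy game, one-hot features, learning rate, and win signal below are illustrative assumptions, not the paper's actual gin rummy setup.

```python
# Minimal TD(0) self-play sketch with a linear value function.
# The game, features, and hyperparameters are assumed placeholders.

ALPHA = 0.1       # learning rate (assumed)
GAMMA = 1.0       # no discounting within an episode (assumed)
N_STATES = 4      # toy chain of states standing in for game positions

def features(state):
    # Hypothetical one-hot encoding of a game state.
    return [1.0 if state == i else 0.0 for i in range(N_STATES)]

def value(weights, state):
    # Linear value approximation: V(s) = w . phi(s).
    return sum(w * x for w, x in zip(weights, features(state)))

def td_update(weights, s, reward, s_next, terminal):
    # TD(0): move V(s) toward reward + gamma * V(s').
    target = reward + (0.0 if terminal else GAMMA * value(weights, s_next))
    delta = target - value(weights, s)
    return [w + ALPHA * delta * x for w, x in zip(weights, features(s))]

# One self-play episode: the same value function scores both sides' play,
# here reduced to a deterministic walk toward a winning terminal state.
w = [0.0] * N_STATES
s = 0
while s < N_STATES - 1:
    s_next = s + 1
    terminal = s_next == N_STATES - 1
    reward = 1.0 if terminal else 0.0   # win signal only at the end
    w = td_update(w, s, reward, s_next, terminal)
    s = s_next
```

Over many episodes the terminal reward propagates backward through the chain, so earlier states acquire value; the coevolutionary alternative instead mutates and selects whole weight vectors by tournament outcomes rather than by per-move TD updates.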
Similar Resources
Temporal-Difference Learning in Self-Play Training
Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...
Rainbow rummy: a web-based game for vocabulary acquisition using computer-directed speech
This paper describes a new on-line game we have developed which allows learners of Chinese or English to practice speaking in a communicative setting. Game play resembles gin rummy or Mah Jong, and is intended to be sufficiently engaging to invite persistent replay. Students compete in a social game against other students at remote settings, or they can play against a robotic partner. A user st...
Feature Construction for Reinforcement Learning in Hearts
Temporal difference (TD) learning has been used to learn strong evaluation functions in a variety of two-player games. TD-gammon illustrated how the combination of game tree search and learning methods can achieve grand-master level play in backgammon. In this work, we develop a player for the game of hearts, a 4-player game, based on stochastic linear regression and TD learning. Using a small ...
Temporal Difference Learning for Nondeterministic Board Games
We use temporal difference (TD) learning to train neural networks for four nondeterministic board games: backgammon, hypergammon, pachisi, and Parcheesi. We investigate the influence of two variables on the development of these networks: first, the source of training data, either learner-vs.-self or learner-vs.-other game play; second, the choice of attributes used: a simple encoding of the boar...
University of Alberta Gradient Temporal-Difference Learning Algorithms
We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with the number of learning parameters. TD methods are powerful prediction techniques, and with function approximation form a core part of modern reinforcement learning (RL). However, the most popular T...
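The linear-complexity property mentioned in that abstract can be sketched with the TDC ("TD with gradient correction") update from the gradient-TD family: a primary weight vector for the value estimate and an auxiliary weight vector for the expected TD error, each updated in O(n) time and memory per step. The step sizes and example transition below are assumptions for illustration.

```python
# Sketch of a TDC-style gradient-TD update with linear function
# approximation. Per-step cost and memory are both O(n) in the number
# of parameters. Step sizes and the example transition are assumed.

GAMMA = 0.9
ALPHA = 0.05   # primary step size (assumed)
BETA = 0.01    # secondary step size (assumed)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tdc_update(theta, w, phi, reward, phi_next):
    """One TDC step: theta holds the value-function weights, w the
    auxiliary weights estimating the expected TD error given phi."""
    delta = reward + GAMMA * dot(theta, phi_next) - dot(theta, phi)
    # Primary update: TD step plus a gradient-correction term using w.
    new_theta = [t + ALPHA * (delta * p - GAMMA * pn * dot(w, phi))
                 for t, p, pn in zip(theta, phi, phi_next)]
    # Auxiliary update: move w's prediction toward the observed TD error.
    new_w = [wi + BETA * (delta - dot(w, phi)) * p
             for wi, p in zip(w, phi)]
    return new_theta, new_w

# Example: a single transition between two one-hot feature vectors.
theta, w_aux = tdc_update([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0])
```

Both update rules touch each of the n components a constant number of times, which is the linear scaling the abstract refers to.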
Publication date: 2003